When using the bootstrap in the presence of measurement error,
we must first estimate the target distribution function; we cannot
directly resample, since we do not have a sample from the target. These
and other considerations motivate the development of estimators of
distributions, and of related quantities such as moments and quantiles,
in errors-in-variables settings. We show that such estimators
have curious and unexpected properties. For example, if the distributions
of the variable of interest, W, say, and of the observation error
are both centered at zero, then the rate of convergence of an estimator
of the distribution function of W can be slower at the origin
than away from the origin. This is an intrinsic characteristic of the
problem, not a quirk of particular estimators; the property holds true
for optimal estimators.

1. Introduction. The problem of nonparametrically estimating a proba- 
bility density, when the data are observed with error, has attracted a great 
deal of interest. However, in a range of circumstances the practical imple- 
mentation of such estimators can be unattractive, since convergence rates 
are slow. Moreover, it is the distribution function, and not the density, that 
is needed in a wide variety of settings. For example, while in conventional 
applications of the bootstrap we proceed by resampling, and do not need 
to compute an empirical distribution function, this approach is infeasible 
when measurement errors are present; instead, we must generate data via a 
distribution-function estimate. 

Therefore, in measurement-error problems, explicit distribution-function 
estimation assumes a substantial degree of importance which it does not 
necessarily enjoy in other settings. However, distribution-function estimators
enjoy properties very different from those of their density counterparts.
In particular, root-n consistent estimation of a distribution function is pos- 
sible if the error distribution is not too smooth. We shall give a necessary 
and sufficient condition for there to exist distribution-function estimators 
that converge at rate $n^{-1/2}$, and we shall explore, both theoretically and
numerically, their intriguing properties. For example, we shall show that 
faster convergence rates can be achieved away from the origin than close to 
the origin. 

It would be misleading to treat this problem in isolation; the unusual 
properties of distribution estimators are reflected in estimators of smooth 
functionals of distributions, for example, in quantile and moment estimators. 
However, while estimators in both these settings can be root-n consistent, 
unusual features make the problems intrinsically interesting. In particular, 
while any polynomial moment can be estimated root-n consistently, where 
n denotes sample size, this is not true of fractional moments. In such cases, 
root-n consistency is feasible if and only if the error distribution is not 
smoother than a certain amount, where the latter condition becomes less 
stringent as the exponent of the moment increases. When root-n consistency 
is possible, it can be achieved without any statistical smoothing. In other 
cases, however, smoothing is necessary in order to achieve minimax-optimal 
convergence rates. 

To give a little background to the problem of distribution estimation, we 
mention that, toward the end of his seminal paper on deconvolution density 
estimation, Fan (1991a) explored the distribution-estimation problem. He 
noted that upper bounds, for his particular estimator, and minimax lower 
bounds for arbitrary estimators, could be obtained, but found that they were 
of different orders of magnitude. He conjectured that his upper bound gave 
the optimal rate, and that the lower-bound rate could be increased to that 
of the upper bound. He suggested that the reason for the gap might be that 
the problem is more complex than his two-alternative analysis allowed, and
that a highly composite-alternative approach could be necessary, as used by 
Stone (1982) in a different problem. 

In fact, the problem is both simpler and more complex than this. It is 
simpler in the sense that a composite-alternative approach is not necessary 
in order to derive optimal rates, but more complex from the viewpoint that, 
apparently unsuspected by previous workers, the distribution-function estimator
converges in an uneven fashion. Specifically, if the distributions of the
variable of interest, W, say, and of the observation error are both centered 
at zero, then the rate of convergence of an estimator of the distribution 
function of W can be relatively slow near the origin, with the result that 
the rate of convergence uniformly on the real line is an order of magnitude 
slower than the rate in the region $\{x : |x| > x_0\}$, for each fixed $x_0 > 0$. This
remark applies both to the upper bound, for a particular estimator based on
integrating a density estimator, and the lower bound, for arbitrary estima- 
tors. Therefore, uneven convergence rates are intrinsic to the problem, and 
are not artifacts of either our methodology or our mathematical arguments 
for deriving upper bounds. 

Fan's (1991a) rates are in a slightly different context from ours; he mea- 
sures smoothness in terms of derivatives, whereas we frame it through tail 
behavior of characteristic functions. The latter approach is arguably more 
natural in the present setting, because popular estimators are based on 
Fourier inversion. However, the two approaches can be reconciled closely. 
To the extent that this is possible, and in the context described in the pre- 
vious paragraph, Fan's lower bound gives the optimal rate at the origin, 
although not at other places, while his upper bound is a little larger. 

The context of density estimation has received greatest attention in the 
literature. Early contributions to this topic, suggesting estimators and dis- 
cussing accuracy, include those of Carroll and Hall (1988), Devroye (1989), 
Stefanski and Carroll (1990), Zhang (1990) and Fan (1991a, 1991b, 1993). 
Hesse (1999) and Delaigle and Gijbels (2004a, 2004b) proposed methods for 
smoothing-parameter choice, Koo (1999) introduced a logspline-based de- 
convolution density estimator, Delaigle and Gijbels (2002) and Hesse and 
Meister (2004) discussed methods for estimating density derivatives, and 
van Es, Spreij and van Zanten (2003) treated volatility density estimation. 

Recent contributions to optimality theory, in the context of density es- 
timation, include those of Butucea (2004), who gave minimax convergence 
rates in cases where the unknown density belongs to a class of supersmooth 
functions, and the error distribution is ordinary-smooth; and Butucea and 
Tsybakov (2008), who provided sharp optimality results in settings where 
the unknown density and unknown error distribution are both supersmooth. 
Although practitioners have demonstrated a marked preference for kernel 
methods, wavelet-based deconvolution density estimators have been shown 
to enjoy excellent adaptivity properties. Note, for example, the contributions 
of Pensky and Vidakovic (1999), Fan and Koo (2002) and Pensky (2002), 
who derived convergence rates. 

Groeneboom and Jongbloed (2003) discussed density estimators based on 
nonparametric maximum likelihood estimation when the error has a uniform 
distribution; see also Groeneboom and Wellner (1992). In terms of conver- 
gence rates, our work is more nearly related to these contributions than to 
most others in the setting of density estimation. 

More closely related still are the papers of Booth and Hall (1993), who 
treated interval estimation in errors-in-variables models; Hesse (1995), who 
gave upper bounds to convergence rates of deconvolution distribution esti- 
mators; van de Geer (1995), who addressed estimation of a linear integral 
functional in a mixture model; Cordy and Thomas (1997), who discussed 
nonparametric estimation of a distribution function when it can be modeled
as a mixture; Jongbloed (1998), who studied isotonic estimation of
a distribution function; Ioannides and Papanastassiou (2001), who treated 
distribution estimation in the case of dependent data; and Qin and Feng 
(2003) and Cui (2005), who developed asymptotic properties of estimators 
of known functions of the mean of the target distribution. 



2. Methodology. 

2.1. Estimators $\hat f_W$ and $\hat F_W$. Assume we observe $X_j = W_j + \delta_j$ for $1 \le j \le n$,
where the $W_j$'s and $\delta_j$'s are independent. If the density $f_\delta$ of $\delta$ is not known,
then the density $f_W$ of $W$ is not identifiable from the $X_j$'s alone. Therefore,
it is very common (see, e.g., the literature cited in Section 1) to assume a
form for $f_\delta$. Only in cases where, for instance, additional data are available
directly on $\delta$ [Diggle and Hall (1993) and Neumann (1997)], or replicated
data are available on X, would this assumption be unnecessary.

A conventional estimator of the density $f_W$ of $W$ is given by

(2.1)  $\hat f_W(x) = \hat f_W(x \mid h) = \frac{1}{nh} \sum_{j=1}^{n} \Re\, L\!\left(\frac{x - X_j}{h}\right),$

where $\Re$ denotes real part,

(2.2)  $L(u) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itu}\, \frac{K^{\mathrm{Ft}}(t)}{f_\delta^{\mathrm{Ft}}(t/h)}\, dt,$

$K$ is a kernel function (in particular, a function that integrates to 1),
$K^{\mathrm{Ft}}(t) = \int e^{itx} K(x)\, dx$ is its Fourier transform, and $h > 0$ is a smoothing parameter.
Note that $\hat f_W$ is well defined even if $f_W$ does not exist. Here and below
we use the notation $f_\delta^{\mathrm{Ft}}$ and $f_W^{\mathrm{Ft}}$ for the characteristic functions of the
distributions $F_\delta$ and $F_W$, without necessarily requiring the existence of the
respective densities $f_\delta$ or $f_W$.

Under the common assumption that $K^{\mathrm{Ft}}$ is compactly supported and
$f_\delta^{\mathrm{Ft}}$ does not vanish on the real line, the integral at (2.2) is well defined
and finite. There is no loss of generality in assuming K is symmetric, and
seldom any loss in supposing the same for $f_\delta$. We shall make these simplifying
assumptions below; they are almost invariably satisfied in practice. Then, L
is real-valued, and so the symbol $\Re$ may be dropped from (2.1).

The estimator $\hat F_W$ is defined as simply the integral of $\hat f_W$ over $(-\infty, x]$,
even in cases where $f_W$ does not exist. Details concerning its computation
and interpretation, especially in the case h = 0, will be given in
Appendix A.1.
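
As a rough illustration of how $\hat F_W(x \mid h)$ might be computed for $h > 0$, the sketch below integrates (2.1) term by term using the standard Fourier-inversion identity $\int_{-\infty}^{v} L(u)\,du = \tfrac12 + \pi^{-1}\int_0^1 t^{-1}\sin(tv)\,K^{\mathrm{Ft}}(t)/f_\delta^{\mathrm{Ft}}(t/h)\,dt$, which holds when K and $f_\delta$ are symmetric and $K^{\mathrm{Ft}}$ is supported on $[-1, 1]$. This is not necessarily the computation described in Appendix A.1. The function names, the choice of a symmetrized Gamma error with characteristic function $(1 + t^2)^{-\alpha/2}$, and the trapezoid rule are all assumptions made for the example.

```python
import numpy as np

def kernel_ft(t, r=4, s=2):
    """K^Ft(t) = (1 - t^r)^s on |t| <= 1, the kernel family at (3.6)."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, (1.0 - t**r) ** s, 0.0)

def error_cf(t, alpha=2.0):
    """Assumed error characteristic function (1 + t^2)^(-alpha/2)."""
    t = np.asarray(t, dtype=float)
    return (1.0 + t**2) ** (-alpha / 2.0)

def deconv_cdf(x, data, h, alpha=2.0, n_t=801):
    """Sketch of F_hat_W(x | h) for h > 0 via Fourier inversion (hypothetical helper)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    t = np.linspace(0.0, 1.0, n_t)                   # K^Ft vanishes outside [-1, 1]
    weight = kernel_ft(t) / error_cf(t / h, alpha)   # K^Ft(t) / f_delta^Ft(t/h)
    v = (x[:, None] - data[None, :]) / h             # (x - X_j) / h
    # sin(t v) / t, written with np.sinc so that t = 0 gives the limit v
    ratio = v[:, :, None] * np.sinc(v[:, :, None] * t / np.pi)
    integrand = ratio.mean(axis=1) * weight          # average over the sample
    dt = t[1] - t[0]
    integral = dt * (integrand.sum(axis=1)
                     - 0.5 * (integrand[:, 0] + integrand[:, -1]))  # trapezoid rule
    return 0.5 + integral / np.pi
```

The intermediate array has shape (number of x values, n, n_t), so for large samples the x values would need to be processed in batches.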

2.2. Moment estimators. If we wished to estimate a moment of W, say,
$\mu_r = E(W^r)$, where $r \ge 1$ was an integer, a naive approach would be to base
the estimator directly on empirical moments of X and the known theoretical
moments of $\delta$. Since symmetry of $F_\delta$ implies $E(\delta) = 0$, then, with $\mu_0 = 1$,

(2.3)  $\mu_r = E(X^r) - \sum_{j=2}^{r} \binom{r}{j} E(\delta^j)\, \mu_{r-j}.$

[Of course, $E(\delta^j)$ vanishes for odd j.] Given estimators $\hat\mu_j$ of $\mu_j$ for $1 \le j \le r - 2$,
substitution into (2.3) suggests an estimator of $\mu_r$, for $r \ge 1$:

(2.4)  $\hat\mu_r = \frac{1}{n} \sum_{i=1}^{n} X_i^r - \sum_{j=2}^{r} \binom{r}{j} E(\delta^j)\, \hat\mu_{r-j}.$

In particular, $\hat\mu_1 = \bar X = n^{-1} \sum_j X_j$ and $\hat\mu_2 = n^{-1} \sum_j X_j^2 - E(\delta^2)$.
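
The recursion at (2.4) can be sketched in a few lines. The function name and the argument layout (a list `delta_moments` with `delta_moments[j]` equal to the known value of $E(\delta^j)$) are assumptions made for illustration; the code also assumes $E(\delta) = 0$, as in the text, so that the inner sum starts at j = 2.

```python
from math import comb
import numpy as np

def moment_estimates(x, delta_moments, r):
    """Return [mu_hat_0, ..., mu_hat_r] using the recursion suggested by (2.4).

    delta_moments[j] must hold the known value of E(delta^j) for j = 0, ..., r,
    with delta_moments[0] = 1 and delta_moments[1] = 0 (hypothetical layout).
    """
    x = np.asarray(x, dtype=float)
    mu = [1.0]                                   # mu_hat_0 = 1 by convention
    for k in range(1, r + 1):
        correction = sum(comb(k, j) * delta_moments[j] * mu[k - j]
                         for j in range(2, k + 1))
        mu.append(float(np.mean(x**k)) - correction)
    return mu
```

For example, with standard Laplace errors one would pass `delta_moments = [1, 0, 2, 0, 24]`, since the second and fourth moments of that distribution are 2 and 24.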

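Exactly the same estimators are obtained using the empirical distribution
function $\hat F_W(\cdot \mid 0)$. That is, if we define

(2.5)  $\tilde\mu_r = \tilde\mu_r(h) = \int_{-\infty}^{\infty} u^r \, d\hat F_W(u \mid h),$

then $\tilde\mu_r(0) = \hat\mu_r$ for $r \ge 1$.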

Provided $E(W^{2r}) + E(\delta^{2r}) < \infty$, the estimator $\hat\mu_r$, and hence also $\tilde\mu_r$,
is root-n consistent for $\mu_r$. However, root-n consistency is generally not
possible for estimators of absolute moments, such as $\nu_q = E|W|^q$, when $q > 0$
is not equal to a positive integer. There is no simple analogue of the estimator
at (2.4) in this case, although $\tilde\mu_r$, at (2.5), is readily generalized to

$\hat\nu_q = \hat\nu_q(h) = \int_{-\infty}^{\infty} |u|^q \, d\hat F_W(u \mid h).$

We shall argue in Section 3.4 that, if q is not an even integer, then $\hat\nu_q$ is
root-n consistent for $\nu_q$ if and only if $F_\delta$ is sufficiently "rough," expressed,
for example, in terms of the rate of convergence of $f_\delta^{\mathrm{Ft}}$ to zero in its tails.
This condition is unnecessary when q is an even integer.

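2.3. Quantile estimators. To estimate the uth quantile, say, $\xi_u = F_W^{-1}(u)$,
where $0 < u < 1$, we first render $\hat F_W$ monotone by defining

$\hat F_W^{\mathrm{mon}}(x) = \hat F_W^{\mathrm{mon}}(x \mid h) = \sup\{\hat F_W(y \mid h) : y \le x\},$

and then we put

$\hat\xi_u = \hat\xi_u(h) = (\hat F_W^{\mathrm{mon}})^{-1}(u) = \sup\{y : \hat F_W^{\mathrm{mon}}(y) \le u\} = \sup\{y : \hat F_W(y) \le u\}.$

Then, $\hat\xi_u$ is our estimator of $\xi_u$.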

The monotonization step serves only to ensure that, with probability 1,
$\hat\xi_u$ is well defined. For the choices of bandwidth, and values of u, that we use
when establishing properties of $\hat\xi_u$, the probability that the monotonization
step makes no difference to the value of $\hat\xi_u$, and, in particular, that $\hat\xi_u$ is well
defined without it, converges to 1 as n increases. In general, the mean-square
convergence rate of $\hat\xi_u$ is strictly slower than $n^{-1}$, and depends on choice
of h.
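
On a finite grid, the monotonization and inversion steps of Section 2.3 might be sketched as follows. The function name, the grid-based approximation, and the use of the non-strict inequality in the final supremum are assumptions made for illustration; the values `F_vals` would come from whatever routine evaluates the distribution estimator on `grid`.

```python
import numpy as np

def monotone_quantile(grid, F_vals, u):
    """Grid approximation to sup{y : F_mon(y) <= u} (hypothetical helper).

    grid   : increasing array of evaluation points y
    F_vals : values of the (possibly non-monotone) distribution estimate on grid
    u      : quantile level, 0 < u < 1
    """
    grid = np.asarray(grid, dtype=float)
    F_vals = np.asarray(F_vals, dtype=float)
    F_mon = np.maximum.accumulate(F_vals)   # running supremum over y <= x
    below = np.nonzero(F_mon <= u)[0]
    if below.size == 0:
        return -np.inf                       # supremum of the empty set
    return grid[below[-1]]
```

Because the running supremum is nondecreasing, the set of grid points at which it stays at or below u is an initial segment of the grid, so taking the last such index returns its supremum.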

3. Theory related to optimality. 

3.1. Function classes. Classes of functions indicated by $\mathcal{F}_j$ will be sets of
distributions, $F_\delta$, say, of the error random variable $\delta$, while classes denoted
by $\mathcal{G}_j$ will be sets of distributions, $F_W$, of W. The positive numbers $\alpha$ and
$\beta$ will represent bounds to the degrees of the polynomial rates at which
$f_\delta^{\mathrm{Ft}}(t)^{-1}$ and $f_W^{\mathrm{Ft}}(t)^{-1}$ diverge as $|t|$ increases. They are generally upper
bounds in the case of $\alpha$, and lower bounds in that of $\beta$.

Given C > 0, write $\mathcal{F}_1(C)$ for the class of all distributions $F_\delta$ for which
$f_\delta^{\mathrm{Ft}}$ is real-valued and positive everywhere, and

(3.1)  $\int_0^{\infty} t^{-2} \{ f_\delta^{\mathrm{Ft}}(t)^{-1} - 1 \}^2 \, dt \le C.$

The integral above is clearly finite on any compact set $[0, t_0]$, with $t_0 > 0$,
and so (3.1) amounts to a condition on the rate at which the tails of $f_\delta^{\mathrm{Ft}}$
approach zero as $|t|$ increases. In particular, (3.1) can be viewed as holding
if and only if $f_\delta^{\mathrm{Ft}}(t)$ does not converge to zero too quickly as $|t|$ increases, or
equivalently, if and only if $F_\delta$ is not too smooth.

For example, $F_\delta \in \mathcal{F}_1(C)$ for sufficiently large C > 0, if the characteristic
function satisfies

(3.2)  $f_\delta^{\mathrm{Ft}}(t) \ge B (1 + |t|)^{-\alpha}$

for some $0 < \alpha < \tfrac12$ and sufficiently large B > 0. Condition (3.2) is close to
asserting that $F_\delta$ has at most $\alpha$ bounded derivatives. A symmetrized Gamma
distribution, with density

(3.3)  $\phi_\alpha(x) = \int_{|y| < \infty} \psi_\alpha(x + y)\, \psi_\alpha(y)\, dy$

for $-\infty < x < \infty$, where $\psi_\alpha(x) = \Gamma(\alpha/2)^{-1} x^{\alpha/2 - 1} e^{-x}$
for $0 < x < \infty$ and $\alpha > 0$,
satisfies both (3.1) and (3.2) provided $\alpha < \tfrac12$.
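
To make the symmetrized Gamma example concrete: the integral at (3.3) is the density of the difference of two independent Gamma($\alpha/2$, 1) variables, whose characteristic function is $(1 + t^2)^{-\alpha/2}$. Since $1 + t^2 \le (1 + |t|)^2$, this function satisfies the lower bound in (3.2) with B = 1. The sketch below, with hypothetical function names, encodes both facts.

```python
import numpy as np

def sym_gamma_cf(t, alpha):
    """Characteristic function of the difference of two independent
    Gamma(alpha/2, 1) variables: (1 + t^2)^(-alpha/2)."""
    t = np.asarray(t, dtype=float)
    return (1.0 + t**2) ** (-alpha / 2.0)

def sample_sym_gamma(alpha, size, rng):
    """Draw from the symmetrized Gamma law as a difference of Gamma draws."""
    return rng.gamma(alpha / 2.0, size=size) - rng.gamma(alpha / 2.0, size=size)

# Numerical check of the polynomial lower bound (1 + |t|)^(-alpha), i.e. B = 1.
t_grid = np.linspace(-50.0, 50.0, 2001)
bound_holds = bool(np.all(sym_gamma_cf(t_grid, 0.25) >= (1.0 + np.abs(t_grid)) ** (-0.25)))
```

With $\alpha = 2$ the difference of two standard exponentials is a standard Laplace variable, which is the rougher of the two error laws used in Section 4.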

Write $\mathcal{F}_2(C)$ for the class of all $F_\delta \in \mathcal{F}_1(C)$ for which $E|\delta| \le C$. The
function classes $\mathcal{F}_3(\alpha, C)$, $\mathcal{F}_4(C, q)$, $\mathcal{F}_5(\alpha, C)$ and $\mathcal{F}_6(\alpha, C)$ will be defined
concisely in Appendix A.2. In heuristic terms, $\mathcal{F}_3(\alpha, C)$ is the class of
distributions $F_\delta$ that satisfy (3.2), have a bounded density and bounded first
absolute moment; and $\mathcal{F}_4(C, q)$ is a class of $F_\delta$ having sufficiently many finite
moments and for which (3.1) holds but with the integral taken over $[1, \infty)$
and $t^{-2}$ replaced by $t^{-2(q+1)}$. The latter constraint increases the permitted
smoothness of $F_\delta$, since it allows the tails of $f_\delta^{\mathrm{Ft}}$ to decrease relatively
quickly.

The function class $\mathcal{F}_5(\alpha, C)$ is the set of distributions in $\mathcal{F}_3(\alpha, C)$ that
have sufficiently many finite moments. And $\mathcal{F}_6(\alpha, C)$ is the subset of $\mathcal{F}_5(\alpha, C)$
for which the smoothness conditions on $f_\delta^{\mathrm{Ft}}$ are imposed not just on that
function but, in an analogous way, on its first two derivatives as well.

For C > 0, let $\mathcal{G}_1(C)$ be the class of distributions $F_W$ that have densities
$f_W$ satisfying $\sup_w f_W(w) \le C$. Write $\mathcal{G}_2(C)$ for the class of $F_W$ for which
$E|W| \le C$. Note that there are distributions in the class $\mathcal{G}_2(C)$ for which
$f_W$ does not exist.

The function classes $\mathcal{G}_3(\beta, C)$, $\mathcal{G}_4(\beta, C)$, $\mathcal{G}_5(C, k)$, $\mathcal{G}_6(\beta, C)$, $\mathcal{G}_7(\beta, C, u, g)$
and $\mathcal{G}_8(\beta, C, u, g)$ will be detailed in Appendix A.3. Heuristically, the class
$\mathcal{G}_3(\beta, C)$ is close to the set of all $F_W$ that have at least $\beta$ uniformly bounded
derivatives, and enjoy finite first absolute moment; and $\mathcal{G}_4(\beta, C)$ is identical,
except for an analogous smoothness condition on $(f_W^{\mathrm{Ft}})'$ rather than on $f_W^{\mathrm{Ft}}$.
The class $\mathcal{G}_5(C, k)$ is a set of $F_W$'s that have a bounded density and bounded
moments of order $4(k + 1)$, and $\mathcal{G}_6(\beta, C)$ is a set of $F_W$'s satisfying this
moment assumption and for which the jth derivative of $f_W^{\mathrm{Ft}}(t)$ decreases at
least as fast as $(1 + |t|)^{-\beta - j}$, where $0 \le j \le 2k + 2$.

In the setting of quantile estimation, a small amount of smoothness in 
the vicinity of the true quantile seems necessary in order to perform the 
distribution inversion. The function classes $\mathcal{G}_7$ and $\mathcal{G}_8$ ensure this, together
with, in the case of $\mathcal{G}_8$, constraining the true quantile not to be close to
the origin. This is necessary in order to tease out the fact that, for quantile 
estimators as well as distribution estimators, convergence rates tend to be 
faster away from the origin than they are close to the origin. 

3.2. Upper bounds to convergence rates for $\hat F_W$. First we treat the case
where $F_\delta$ is particularly "rough," in the sense that its characteristic function
converges so slowly to zero in the tails that we may use the estimator $\hat F_W$
with h arbitrarily small; see Appendix A.1. Results (3.4) and (3.5), below,
show that in this setting root-n consistency is possible. A converse to (3.5)
will be given in Theorem 3.4.

Theorem 3.1. Assume $\int |K| < \infty$ and $\int K = 1$. Then, for each $C_1, C_2 > 0$,

(3.4)  $\sup_{F_\delta \in \mathcal{F}_1(C_1)} \sup_{F_W \in \mathcal{G}_1(C_2)} \sup_{n \ge 1} \sup_{-\infty < x < \infty} n E\{\hat F_W(x \mid 0) - F_W(x)\}^2 \le 4 C_1 C_2 / \pi,$

(3.5)  $\sup_{F_\delta \in \mathcal{F}_2(C_1)} \sup_{F_W \in \mathcal{G}_2(C_2)} \sup_{n \ge 1} n \int_{-\infty}^{\infty} E\{\hat F_W(x \mid 0) - F_W(x)\}^2 \, dx \le 4 (C_1 + C_2).$

Next we treat cases where, in general, choosing a strictly positive value
of h can be advantageous. For definiteness, we choose K so that $K^{\mathrm{Ft}}$ is a
compactly supported piece of a polynomial:

(3.6)  $K^{\mathrm{Ft}}(t) = (1 - t^r)^s \, 1(|t| \le 1),$

where $r \ge 2$ is an even integer, and $s \ge 1$ is an integer.

Such kernels are widely used in practice, where they have good numerical 
and theoretical performance; see Delaigle and Hall (2006). They satisfy the 
conditions imposed on K in Theorem 3.1. More general kernels may be used, 
but they generally require stronger conditions defining the function classes. 

Define $\ell_h = 1 + |\log h|$ if $\alpha = \tfrac12$, and $\ell_h = 1$ otherwise. In Theorem 3.2
below, (3.7) and (3.8) give convergence rates uniformly in all x, and in x
not close to the origin, respectively. These rates are shown in Theorem 3.5
to be optimal in the respective cases, if $\alpha > \tfrac12$. Result (3.9) gives the $L_2$
convergence rate.

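Theorem 3.2. Assume K satisfies (3.6) with $r > \beta + \tfrac12$. Then, for each
$C_1, C_2 > 0$, $0 < h \le 1$ and $n \ge 1$,

(3.7)  $\sup_{F_\delta \in \mathcal{F}_3(\alpha, C_1)} \sup_{F_W \in \mathcal{G}_3(\beta, C_2)} \sup_{-\infty < x < \infty} E\{\hat F_W(x \mid h) - F_W(x)\}^2 \le B\{h^{2\beta} + n^{-1}(1 + h^{1 - 2\alpha} \ell_h)\},$

(3.8)  $\sup_{F_\delta \in \mathcal{F}_3(\alpha, C_1)} \sup_{F_W \in \mathcal{G}_4(\beta, C_2)} \sup_{|x| > x_1} E\{\hat F_W(x \mid h) - F_W(x)\}^2 \le B\{x_1^{-2} h^{2\beta + 2} + n^{-1}(1 + h^{1 - 2\alpha} \ell_h)\},$

(3.9)  $\int_{-\infty}^{\infty} E\{\hat F_W(x \mid h) - F_W(x)\}^2 \, dx \le B\{h^{2\beta + 1} + n^{-1}(1 + h^{1 - 2\alpha} \ell_h)\},$

where, in each case, B > 0 depends only on $C_1$, $C_2$, r, s, $\alpha$ and $\beta$.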

Result (3.7), when $0 < \alpha < \tfrac12$ and $\beta = 0$, is close to (3.4), although without
an explicit formula for B on the right-hand side. Note that when $0 < \alpha < \tfrac12$
we may take h = 0 in (3.7).

To exhibit convergence rates, define $\ell = \log n$ if $\alpha = \tfrac12$, and $\ell = 1$ if $\alpha > \tfrac12$;
put $h_1 = h_2 = h_3 = 0$ if $0 < \alpha < \tfrac12$, and $h_j = C(\ell/n)^{1/(2\alpha + 2\beta + j - 2)}$ if $\alpha \ge \tfrac12$,
where C > 0; define $\rho_j = n^{-1}$ if $0 < \alpha < \tfrac12$; and put $\rho_j = (\ell/n)^{(2\beta + j - 1)/(2\alpha + 2\beta + j - 2)}$
if $\alpha \ge \tfrac12$. The rates in (3.10), (3.11) and (3.12) below are obtained on taking
$h = h_1$, $h_3$ and $h_2$ in (3.7), (3.8) and (3.9), respectively.

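Corollary 3.3. If K satisfies (3.6) with $r > \beta + \tfrac12$, and if $h_1, h_2, h_3$
are chosen as suggested above, then

(3.10)  $\sup_{F_\delta \in \mathcal{F}_3(\alpha, C_1)} \sup_{F_W \in \mathcal{G}_3(\beta, C_2)} \sup_{-\infty < x < \infty} E\{\hat F_W(x \mid h_1) - F_W(x)\}^2 = O(\rho_1),$

(3.11)  $\sup_{F_\delta \in \mathcal{F}_3(\alpha, C_1)} \sup_{F_W \in \mathcal{G}_4(\beta, C_2)} \sup_{|x| > x_1} E\{\hat F_W(x \mid h_3) - F_W(x)\}^2 = O(\rho_3),$

(3.12)  $\int_{-\infty}^{\infty} E\{\hat F_W(x \mid h_2) - F_W(x)\}^2 \, dx = O(\rho_2).$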

The rates $\rho_1$, $\rho_2$ and $\rho_3$ are in the order $\rho_3 \le \rho_2 \le \rho_1$. That is, mean-square
convergence away from the origin is fastest, followed by convergence of mean
integrated squared error, followed by mean-square convergence across the
whole real line. The reason, as we shall show more explicitly in Theorem 3.5
and in Section 3.3 below, is that the estimator $\hat F_W$ has difficulty in the
neighborhood of the origin, and performs better outside that region. In approximate
terms, its squared bias is of order $h^{2\beta}$ within radius O(h) of the origin,
and of order $h^{2(\beta + 1)}$ a further distance away. Therefore, the squared-bias
contribution to mean integrated squared error is of order $h (h^{\beta})^2 = h^{2\beta + 1}$.
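
The ordering of the three rates can be checked numerically from the exponents defined before Corollary 3.3. The sketch below covers only the case $\alpha > \tfrac12$, where $\ell = 1$; the function names are hypothetical. It also verifies that the bandwidth $h_1$ balances the squared-bias and variance terms $h^{2\beta}$ and $n^{-1} h^{1 - 2\alpha}$ of (3.7).

```python
import math

def rate_exponent(alpha, beta, j):
    """Exponent e_j with rho_j = n^(-e_j), for alpha > 1/2 (so that ell = 1)."""
    if alpha <= 0.5:
        raise ValueError("this sketch covers only alpha > 1/2")
    return (2 * beta + j - 1) / (2 * alpha + 2 * beta + j - 2)

def bandwidth(n, alpha, beta, j, C=1.0):
    """h_j = C * n^(-1 / (2 alpha + 2 beta + j - 2)), for alpha > 1/2."""
    if alpha <= 0.5:
        raise ValueError("this sketch covers only alpha > 1/2")
    return C * n ** (-1.0 / (2 * alpha + 2 * beta + j - 2))

alpha, beta, n = 2.0, 1.0, 1000
exponents = [rate_exponent(alpha, beta, j) for j in (1, 2, 3)]   # j = 1, 2, 3
rhos = [n ** (-e) for e in exponents]                            # rho_1, rho_2, rho_3

h1 = bandwidth(n, alpha, beta, 1)
squared_bias = h1 ** (2 * beta)            # h^(2 beta), first term of (3.7)
variance = h1 ** (1 - 2 * alpha) / n       # n^(-1) h^(1 - 2 alpha), second term
```

Writing the exponent as $1 - (2\alpha - 1)/(2\alpha + 2\beta + j - 2)$ shows that it increases with j when $\alpha > \tfrac12$, which is the ordering $\rho_3 \le \rho_2 \le \rho_1$.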

Note, however, that this discussion is predicated on the assumption that
the distribution of $\delta$ is symmetric, and the distribution of W is in both
$\mathcal{G}_3(\beta, C_2)$ and $\mathcal{G}_4(\beta, C_2)$. If, for example, $f_W^{\mathrm{Ft}}(t) = e^{i B_1 t} (1 + B_2 |t|)^{-\beta}$, for real
$B_1$ and $B_2 > 0$, then $F_W \in \mathcal{G}_3(\beta, C_2)$ for sufficiently large $C_2$, but $F_W$ does
not lie in $\mathcal{G}_4(\beta, C_2)$ for any $C_2$ unless $B_1 = 0$. Therefore, a degree of centering
at zero is being assumed. Of course, if we shift the center of the distribution
of W to $B_1$, then the results described in the previous paragraph continue
to hold if we replace "the origin" by "$B_1$" throughout. This should be borne
in mind when interpreting discussion below.

These sizes of squared bias are reflected directly by the first terms on the
right-hand sides of (3.7)-(3.9). Moreover, as is suggested by the second terms
there, and will be confirmed by the more detailed analysis in Section 3.5,
error-about-the-mean properties of $\hat F_W$ are very similar near the origin and
away from the origin; their orders of magnitude do not alter. 

3.3. Lower bounds to convergence rates for $\hat F_W$. If $E|\delta| < \infty$, then (3.1)
is equivalent to

(3.13)  $\int_1^{\infty} t^{-2} f_\delta^{\mathrm{Ft}}(t)^{-2} \, dt < \infty.$

We know from (3.5) in Theorem 3.1 that, provided $E|\delta| < \infty$, (3.13) is sufficient
for root-n consistency of $\hat F_W$, in the mean integrated squared error
sense, uniformly over $F_W \in \mathcal{G}_2(C)$ for each fixed C > 0. Our next result
shows that, under a mild additional assumption, (3.13) is also necessary for
root-n consistency.

Theorem 3.4. Let $\tilde F$ denote any measurable functional of the data.
If $f_\delta$ is of bounded variation and $f_\delta^{\mathrm{Ft}}(t)$ is nonvanishing and eventually,
for sufficiently large, positive t, monotone decreasing in t, and if, for some
C > 0,

$\sup_{F_W \in \mathcal{G}_2(C)} \int_{-\infty}^{\infty} E\{\tilde F(x) - F_W(x)\}^2 \, dx = O(n^{-1}),$

then (3.13) holds.

Next we show that, despite the difficulty that $\hat F_W$ can experience in a
neighborhood of the origin, it converges there at the minimax-optimal rate. 
Likewise, it has optimal performance away from the origin. In particular, the 
convergence rates at (3.10) and (3.11) are both optimal. In view of what we 
have already learned, it is unsurprising that the rate of convergence of mean 
integrated squared error, in (3.12), is not optimal. Faster convergence rates 
can be achieved by using variable-bandwidth methods, where the bandwidth 
close to the origin is an order of magnitude smaller than that away from the 
origin. 

Recall the definitions $\rho_1 = n^{-2\beta/(2\alpha + 2\beta - 1)}$ and $\rho_3 = n^{-(2\beta + 2)/(2\alpha + 2\beta + 1)}$,
appropriate for $\alpha > \tfrac12$. Let $\mathcal{S}$ denote the class of measurable functionals of
the data $X_1, \ldots, X_n$. Theorem 3.5, below, demonstrates optimality of the
convergence rates given in (3.7) and (3.8) of Theorem 3.2.

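Theorem 3.5. Let $F_\delta$ be a distribution for which $f_\delta^{\mathrm{Ft}}$ is real-valued and
positive everywhere, and $|(f_\delta^{\mathrm{Ft}})^{(j)}(t)| \le C_1 (1 + |t|)^{-\alpha - j}$ for all t and for j =
0, 1, 2, where $C_1 > 0$. Then, provided $\alpha > \tfrac12$, $x_1 \ne 0$, and $C_2 > 0$ is sufficiently
large, there exists $C_3 > 0$ such that

$\inf_{\tilde F \in \mathcal{S}} \sup_{F_W \in \mathcal{G}_3(\beta, C_2)} E\{\tilde F(0) - F_W(0)\}^2 \ge C_3 \rho_1,$

$\inf_{\tilde F \in \mathcal{S}} \sup_{F_W \in \mathcal{G}_4(\beta, C_2)} E\{\tilde F(x_1) - F_W(x_1)\}^2 \ge C_3 \rho_3.$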

3.4. Convergence rates of moment and quantile estimators. Let $k \ge 0$ be
an integer, and $q \in (2k, 2k + 2)$. Define $\ell_{hq} = 1 + |\log h|$ if $\alpha = q + \tfrac12$ and
$\ell_{hq} = 1$ otherwise; and put $\rho_4 = n^{-(2\beta + 2q)/(2\alpha + 2\beta - 1)}$. Result (3.16) below
gives a convergence rate which, when $\alpha > q + \tfrac12$ and $h = \mathrm{const.}\, n^{-1/(2\alpha + 2\beta - 1)}$,
becomes identical to $O(\rho_4)$; and (3.17) shows that this rate is optimal. In
that result we interpret $\tilde\nu_q$ as a functional of $\tilde F \in \mathcal{S}$.

Theorem 3.6, below, is an analogue of Theorems 3.1 and 3.2 in the context
of estimating the absolute moment $\nu_q$. It shows that root-n consistency is
possible, provided the distribution of $\delta$ is sufficiently rough; and it gives
upper bounds to convergence rates in other cases.

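Theorem 3.6. Let $k \ge 0$ be an integer, and let $2k < q < 2k + 2$. Assume
K is given by (3.6), with, in the case of (3.15) below, $r > 2k + 2$, and, for
(3.16), $r > \max(\beta + q, 2k + 2)$. Then, for each $C_1, C_2 > 0$ and for $0 < h \le 1$,

(3.15)  $\sup_{F_\delta \in \mathcal{F}_4(C_1, q)} \sup_{F_W \in \mathcal{G}_5(C_2, k)} E(\hat\nu_q - \nu_q)^2 \le C n^{-1},$

(3.16)  $\sup_{F_\delta \in \mathcal{F}_5(\alpha, C_1)} \sup_{F_W \in \mathcal{G}_6(\beta, C_2)} E(\hat\nu_q - \nu_q)^2 \le C\{h^{2\beta + 2q} + n^{-1}(1 + h^{2q + 1 - 2\alpha} \ell_{hq})\},$

where C > 0 depends only on $C_1$, $C_2$ and q. Furthermore, if $F_\delta \in \mathcal{F}_5(\alpha, C_1)$
and $\alpha > q + \tfrac12$, then

(3.17)  $\inf_{\tilde F \in \mathcal{S}} \sup_{F_W \in \mathcal{G}_6(\beta, C_2)} E(\tilde\nu_q - \nu_q)^2 \ge C_3 \rho_4.$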

Theorem 3.4 has an analogue in this setting, asserting that if $E(\tilde\nu_q - \nu_q)^2 = O(n^{-1})$
and $F_\delta$ satisfies mild additional assumptions, then the integrals at
(A.7) (see Appendix A.2) converge.

To address the case of quantile estimation, recall from Section 2.3 the
definitions of $\xi_u$ and $\hat\xi_u$, where $0 < u < 1$. Let $h_1$, $h_3$, $\rho_1$ and $\rho_3$ be as given
immediately prior to Corollary 3.3, and let the function g satisfy the condi- 
tions in the definition of G*(C,g) in Section 3.1. 

Results (3.18) and (3.19) below give upper bounds to rates of convergence 
for our quantile estimators when the quantile can lie anywhere, or is bounded 
away from the quantile for which $\xi_u = 0$, respectively. Results (3.20) and
(3.21) give lower bounds, complementary to (3.18) and (3.19) respectively, 
in the case of general estimators. 

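Theorem 3.7. Assume that $\alpha > \tfrac12$, with in addition $\alpha + 2\beta > 2$ and
$\beta > 1$ in the case of (3.18) and (3.20), respectively; and suppose that K is
given by (3.6) with $r > \beta + \tfrac12$. Then, for each $C_1, C_2 > 0$,

(3.18)  $\lim_{\lambda \to \infty} \limsup_{n \to \infty} \sup_{F_\delta \in \mathcal{F}_6(\alpha, C_1)} \sup_{F_W \in \mathcal{G}_7(\beta, C_2, u, g)} P\{|\hat\xi_u(h_1) - \xi_u| > \rho_1^{1/2} \lambda\} = 0,$

(3.19)  $\lim_{\lambda \to \infty} \limsup_{n \to \infty} \sup_{F_\delta \in \mathcal{F}_6(\alpha, C_1)} \sup_{F_W \in \mathcal{G}_8(\beta, C_2, u, g)} P\{|\hat\xi_u(h_3) - \xi_u| > \rho_3^{1/2} \lambda\} = 0,$

(3.20)  $\liminf_{\lambda \downarrow 0} \liminf_{n \to \infty} \inf_{\tilde\xi_u \in \mathcal{S}} \sup_{F_\delta \in \mathcal{F}_6(\alpha, C_1)} \sup_{F_W \in \mathcal{G}_7(\beta, C_2, u, g)} P(|\tilde\xi_u - \xi_u| > \rho_1^{1/2} \lambda) > 0,$

(3.21)  $\liminf_{\lambda \downarrow 0} \liminf_{n \to \infty} \inf_{\tilde\xi_u \in \mathcal{S}} \sup_{F_\delta \in \mathcal{F}_6(\alpha, C_1)} \sup_{F_W \in \mathcal{G}_8(\beta, C_2, u, g)} P(|\tilde\xi_u - \xi_u| > \rho_3^{1/2} \lambda) > 0.$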

3.5. Limiting distributions. Under conditions more restrictive than those
imposed in Section 3.1, it is possible to obtain central limit theorems for
$\hat F_W$, exhibiting the convergence rates discussed in Section 3.2 and having
explicitly-given biases and variances. The main features of these results
are as follows: (a) The asymptotic variance equals a constant multiple of
$n^{-1} h^{1 - 2\alpha}$, where the constant, V(x), say, depends on x; (b) When x = 0,
the asymptotic bias is a constant multiple, $B_1$, say, of $h^{\beta}$; and (c) When
$x \ne 0$ the bias is asymptotic to $B_2(h, x) h^{\beta + 1}$, where $B_2(h, x)$ is uniformly
bounded as $h \downarrow 0$, and exceeds, in absolute value and for arbitrarily small h,
a fixed constant as h decreases. Formulae for V(x), $B_1$ and $B_2(h, x)$ are
given at (3.22) and (3.23).

To appreciate the relevance of these results, we interpret them in the
context of (3.7) and (3.8). Excepting the case $\alpha = \tfrac12$, there is a term of
size $n^{-1} h^{1 - 2\alpha}$ on the right-hand sides of both those formulae. This term
represents the main effect of variance, and is as indicated in (a) above. In
(3.8), which is for the case of values x that are bounded away from zero,
there is a term of size $h^{2\beta + 2}$ on the right-hand side. This represents the
main effect of squared bias, and (c) above notes that its order of magnitude
cannot be reduced. In (3.7), which includes the case x = 0, there is a term of
size $h^{2\beta}$ on the right-hand side, and as (b) above observes, this too cannot
be reduced.

Next we outline regularity conditions that give rise to these explicit
expansions. Recall that most of the classes of distributions $F_\delta$ ask that $f_\delta^{\mathrm{Ft}}$
decrease no faster than $t^{-\alpha}$ as t increases. On the present occasion our main
requirements are that $f_\delta^{\mathrm{Ft}}$, and its first derivative, have an explicit expansion
in inverses of polynomials up to a degree which strictly exceeds $2\alpha$,
and that $f_W^{\mathrm{Ft}}(t)$ behave to first order like a constant multiple of $t^{-\beta}$, with
a remainder that is small enough to permit inversion of the characteristic
function uniformly in $|x| > x_0$. These properties are captured by regularity
conditions (A.8) and (A.9), given in Appendix A.4.

Theorem 3.8. Assume (A.8) and (A.9), that $f_X$ is bounded and continuous
at x, and that the bandwidth satisfies $h = h(n) \to 0$ and $n h^{2\alpha - 1} \to \infty$.
Then, $\hat F_W(x) - F_W(x)$ is asymptotically normally distributed with mean
$B_1 h^{\beta} + o(h^{\beta})$ or $B_2(h, x) h^{\beta + 1} + o(h^{\beta + 1})$, according as x = 0 or $x \ne 0$
respectively, where

b ' , . (-IV 
(3.22)

acos(x/h) + bsin(x/h) 



and with variance $n^{-1} h^{1 - 2\alpha} V(x)$, where

(3.23) V(x) =7T- 2 z 2 f x {x) J^^^p^(l-f) s t a dtj 2 du. 
In (3.22) and (3.23), the integers r and s are as at (3.6). 



4. Numerical properties. 



4.1. Finite-sample performance of the distribution function, absolute moment
and quantile estimators. In this section we report the results of a
simulation study illustrating the theoretical results and finite-sample behavior
of the estimators of population features considered in Sections 2
and 3. We consider three distributions for W: (1) $W \sim N(0, 1)$, (2) $W \sim
\tfrac12 N(-3, 1) + \tfrac12 N(2, 1)$ and (3) $W \sim \mathrm{Gamma}(2, 1)$. Distributions (1), (2) and
a variant of (3) were considered by Delaigle and Gijbels (2004a); see their
#1, #3 and #2, respectively. Note that (1) gives a unimodal, symmetric
(about 0) density, (2) a bimodal and two-sided density, and (3) a unimodal,
one-sided density. Furthermore, the tails of the characteristic functions of
(1) and (2) decay exponentially fast, while those of (3) decay at a polynomial
rate.

For the error distribution, we consider the symmetrized Gamma($\alpha$, 1)
distributions [cf. (3.3)] with $\alpha = 2$ or $\alpha = 6$. For each combination of the
target and error distributions, we consider two different sample sizes, n = 100 
and n = 800, and a range of values of the smoothing parameter h, specifically 
{0.2, 0.4, . . . , 2.0}. In the simulation study for this section we choose the 
kernel K at (3.6), with r = 4 and s = 2. The number of simulation runs used 
in each case is 500. 
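
A data generator for this design might look as follows. It is only a sketch: the function names are hypothetical, the equal mixture weights in model (2) are an assumption consistent with the mean of W being $-0.5$, and the error is drawn as the difference of two independent Gamma($\alpha/2$, 1) variables, which is how the density at (3.3) reads.

```python
import numpy as np

def draw_target(model, n, rng):
    """Draw n values of W under model (1), (2) or (3)."""
    if model == 1:
        return rng.normal(0.0, 1.0, size=n)
    if model == 2:
        centers = rng.choice([-3.0, 2.0], size=n)    # assumed equal weights
        return centers + rng.normal(0.0, 1.0, size=n)
    if model == 3:
        return rng.gamma(2.0, 1.0, size=n)           # shape 2, scale 1
    raise ValueError("model must be 1, 2 or 3")

def draw_error(alpha, n, rng):
    """Symmetrized Gamma error: difference of two Gamma(alpha/2, 1) draws."""
    return rng.gamma(alpha / 2.0, size=n) - rng.gamma(alpha / 2.0, size=n)

def draw_sample(model, alpha, n, rng):
    """Contaminated observations X_j = W_j + delta_j."""
    return draw_target(model, n, rng) + draw_error(alpha, n, rng)

# One replicate of the smaller design: model (1), alpha = 2, n = 100.
rng = np.random.default_rng(2024)
x = draw_sample(1, 2.0, 100, rng)
h_grid = np.arange(1, 11) * 0.2                      # {0.2, 0.4, ..., 2.0}
```

Repeating the last step 500 times for each combination of model, $\alpha$, n and h, and averaging the squared errors of the chosen estimator, would give Monte Carlo MSE curves of the kind reported in this section.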

The set of x values is $\{-0.8, 0, 1.5\}$ for model (1), $\{-3.0, -0.5, 1.5\}$ for
(2) and $\{0.8, 1.5, 3.0\}$ for (3). Note that x = 0 is a common point of
symmetry for W and $\delta$ under model (1), while $x = -3.0$ and $x = -0.5$ are
respectively one of the modes of W and the mean (or median) of W under
model (2). The other x-values are chosen such that each set addresses both
sides of the median of W. The optimal value of h depends on the level, u,
of the quantile, although the degree of sensitivity varies from one model to
another. For all three models considered in our numerical work, the optimal
h lies in the interval [0.4, 0.8] for the coarser error distribution with $\alpha = 2$,
and in the interval [1.0, 1.4] when $\alpha = 6$, reflecting the fact that a larger value
of h is more appropriate for error variables with a smoother distribution.

Figures 1 and 2 give the mean squared errors (MSEs) of the distribution
function estimator $\hat F_W(x \mid h)$ as a function of h for different values of the
argument x, under models (1) and (2) respectively. The graphs in the case
of model (3) are close to those for model (1), provided $x = -0.8$, 0 and 1.5
are replaced by x = 0.8, 1.5 and 3.0, respectively.

The shape of the target distribution (i.e., unimodality versus bimodality)
also seems to have an effect on the MSE curve, and hence, on the optimal
value of the smoothing parameter, h. Interestingly, for $\alpha = 6$, the value
h = 1.0 is the best choice, among those considered, for all the x's and n's
under models (1) and (3), and also for $x = -0.5$ and both the n's under
model (2).

Fig. 1. MSEs of the distribution function estimator $\hat F_W(x \mid h)$ under model (1), as a
function of $h \in \{0.2, 0.4, \ldots, 2.0\}$, for $x \in \{-0.8, 0.0, 1.5\}$. In each panel, the MSE curves
are marked with circles for $x = -0.8$, with squares for x = 0.0 and with triangles for
x = 1.5. The error distribution is given by (3.3) with $\alpha \in \{2, 6\}$. The results are based on
500 simulation runs.

Fig. 2. MSEs of the distribution function estimator $\hat F_W(x \mid h)$ under model (2), as a
function of $h \in \{0.2, 0.4, \ldots, 2.0\}$, for $x \in \{-3.0, -0.5, 1.5\}$. In each panel the MSE curves
are marked with circles for $x = -3.0$, with squares for $x = -0.5$ and with triangles for
x = 1.5. The error distribution is given by (3.3) with $\alpha \in \{2, 6\}$. The results are based on
500 simulation runs.

Next, we consider the absolute moment estimator $\hat\nu_q(h)$ of Section 2.2, and
the quantile estimator $\hat\xi_u(h)$ of Section 2.3. Figures 3 and 4 give the MSE
functions of $\hat\nu_q(h)$ for $q \in \{0.5, 1, 1.5\}$, and of $\hat\xi_u(h)$ for $u \in \{0.4, 0.5, 0.7\}$,
under model (1). To save space, we omit results for the other two models.
The range of h values is the same as before, except in the case of the absolute
moment estimators, where h is restricted to the subset {0.2, 0.4, ..., 1.4}. For
$h \in \{1.6, 1.8, 2.0\}$, the MSEs of $\hat\nu_q(h)$ become too large (in the hundreds to
thousands, depending on the value of q), and hence are omitted from the plot.

Note that the MSE functions for estimating the absolute moments are
also nicely curved, in all cases attaining their minima, among the values
of h considered, at h = 1.0. However, the moment estimator $\hat\nu_q(h)$ seems
to be very sensitive to oversmoothing, that is, to choice of too-high values
of h. In comparison, the MSE functions of the quantile estimator $\hat\xi_u(h)$ are
much more stable for under- and over-smoothing. Further, unlike the cases
of moment and distribution function estimation, estimation of the two lower
quantiles, u = 0.4 and 0.5, seems to be less sensitive to smoothness of the
error law; here the best performance is achieved when the values of h are
small. For the higher quantile, u = 0.7, the optimal h shows dependence on
the smoothness level of the error distribution, with larger h-values giving
better performance.

Fig. 3. MSEs of the absolute moment estimator $\hat\nu_q(h)$ under model (1), as a function of
h, for $q \in \{0.5, 1.0, 1.5\}$. In each panel the MSE curves are marked with circles for q = 0.5,
with squares for q = 1.0 and with triangles for q = 1.5. The error distribution is given by
(3.3) with $\alpha \in \{2, 6\}$. The results are based on 500 simulation runs.
Fig. 4. MSEs of the quantile estimator $\hat\xi_u(h)$ under model (1), as a function of
$h \in \{0.2, 0.4, \ldots, 2.0\}$, for $u \in \{0.4, 0.5, 0.7\}$. In each panel the MSE curves are marked
with circles for u = 0.4, with squares for u = 0.5 and with triangles for u = 0.7. The error
distribution is given by (3.3) with $\alpha \in \{2, 6\}$. The results are based on 500 simulation runs.



We next consider the effects of the argument on accuracy of distribution
function and quantile estimation (cf. Theorems 3.2 and 3.7). Recall that
under model (1), and under our choice of the symmetric error distribution,
Theorem 3.2 asserts that the estimator $\hat F_W(x)$ has a faster optimal rate of
convergence at a nonzero x compared to that at x = 0. Similar behavior is
predicted by Theorem 3.7 for the quantile estimator $\hat\xi_u(h)$ at $u \ne 0.5$ compared
with u = 0.5. Figure 5 gives boxplots of the differences $\hat F_W(x) - F_W(x)$ at
x = −0.8, 0, 1.5 and $\hat\xi_u(h) - \xi_u$ at u = 0.4, 0.5, 0.7 under model (1) and $\alpha = 2$.
The smoothing parameter for each value of the argument in $\hat F_W(x)$ and
$\hat\xi_u(h)$ is chosen to be the corresponding optimal value from Figures 1 and 4,
respectively.

Fig. 5. Box-plots of the deviations of the distribution function estimates and quantile
estimates from their target values under model (1). In each of the top two panels the three
box-plots correspond to the difference $\hat F_W(x \mid h) - F_W(x)$ for $x \in \{-0.8, 0.0, 1.5\}$, respectively,
and in the lower two panels, to $\hat\xi_u(h) - \xi_u$ for $u \in \{0.4, 0.5, 0.7\}$, respectively. Here,
the h-values are set at the respective optimal levels given in Figures 1 and 4. The results
are based on 500 simulation runs.

It is evident from Figure 5 that, when estimating the distribution
function, the case x = 0 shows maximum variability around the target value
at both sample sizes n = 100 and n = 800. The lower panels of Figure 5
show a similar pattern for the median estimator, u = 0.5, compared to the
quantile estimators with u = 0.4 and 0.7. The pronounced negative bias in
the case of the median estimator reflects the difficulty of estimating the
distribution function at the median [in the case of model (1)], and of estimating
the median itself, relative to estimation at other places. See Theorem 3.7.
However, the extent of the bias is greater than we had anticipated.

4.2. Empirical choice of bandwidth. We shall modify the "normal reference"
approach, suggested by Delaigle and Gijbels (2004a) in the setting
of density, rather than distribution, estimation. In particular, we shall
temporarily take $f_W$ to be a normal $N(0, \sigma_W^2)$ density, with
$\sigma_W^2 = \mathrm{var}(W) = \mathrm{var}(X) - \mathrm{var}(\delta)$; and compute an estimator $\hat\sigma_W^2$ of $\sigma_W^2$ as the variance of
the data $X_j$, minus the known variance of $\delta$. (The optimal bandwidth is
invariant under changes to the location of $F_W$.)
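The plug-in step for the variance of W is a one-line computation. The sketch below is in Python (an illustrative choice; the function name `sigma_w_squared` is hypothetical) and assumes the n − 1 divisor for the sample variance, which the text does not specify.

```python
import statistics

def sigma_w_squared(x, var_delta):
    """Estimate var(W) = var(X) - var(delta) from contaminated data x.

    Assumes the n - 1 divisor for the sample variance; the text does not
    say which divisor is used.
    """
    # In small samples this difference can be negative. The text does not
    # discuss that case, so the raw value is returned unchanged.
    return statistics.variance(x) - var_delta
```

As an example of the second argument, a Laplace error with unit scale has variance 2, so the call would be `sigma_w_squared(data, 2.0)`.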

To implement this approach, we shall use the following account of mean
integrated squared error of the estimator $\hat F_W(\cdot \mid h)$; see Appendix A.4 for
regularity conditions.

Theorem 4.1. If (A.10) and (A.11) hold, then, as $n \to \infty$ and $h \to 0$,

$$\int_{-\infty}^{\infty} E\{\hat F_W(x \mid h) - F_W(x)\}^2\,dx = n^{-1} I(h) + B_W h^4 + o(n^{-1} h^{1-2\alpha} + h^4), \tag{4.1}$$

where

$$2\pi I(h) = \int t^{-2}\{1 - K^{\mathrm{Ft}}(ht)/f_\delta^{\mathrm{Ft}}(t)\}^2\,dt, \qquad B_W = \tfrac14 \kappa_2^2 \int (f_W')^2,$$

and $\kappa_2 = \int x^2 K(x)\,dx$, and $\alpha$ denotes the exponent of decay of $f_\delta^{\mathrm{Ft}}(t)$.

We may compute I(h) by numerical integration. Alternatively, it can be
approximated as $I(h) \sim A_\delta h^{1-2\alpha}$, where $A_\delta = C^{-2}\kappa/\pi$, $\kappa = \int_{t>0} t^{2\alpha-2} K^{\mathrm{Ft}}(t)^2\,dt$,
and the constants C and $\alpha$ are as in the asymptotic relation, $f_\delta^{\mathrm{Ft}}(t) \sim C t^{-\alpha}$
as $t \to \infty$.
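The two routes to I(h) can be compared numerically. The sketch below is a Python illustration under assumptions that are not taken from the paper: Laplace errors, whose characteristic function is $(1+t^2)^{-1}$, so that the decay exponent is 2 and C = 1; and a kernel whose Fourier transform is $(1-t^2)^3$ on [−1, 1], which need not coincide with the kernel at (3.6). The helper names are hypothetical, and a plain midpoint rule replaces a library integrator.

```python
import math

ALPHA, C = 2.0, 1.0  # Laplace error (assumed): f_delta^Ft(t) = 1/(1+t^2) ~ C t^(-ALPHA)

def k_ft(t):
    """Assumed kernel transform, supported on [-1, 1]; not necessarily (3.6)."""
    return (1.0 - t * t) ** 3 if abs(t) < 1.0 else 0.0

def f_delta_ft(t):
    """Characteristic function of the assumed Laplace error."""
    return 1.0 / (1.0 + t * t)

def midpoint(func, a, b, m=20000):
    """Composite midpoint rule; never evaluates func at the endpoints."""
    step = (b - a) / m
    return step * sum(func(a + (i + 0.5) * step) for i in range(m))

def I_exact(h):
    """I(h) = (1/(2 pi)) int t^-2 {1 - K^Ft(ht)/f_delta^Ft(t)}^2 dt."""
    g = lambda t: ((1.0 - k_ft(h * t) / f_delta_ft(t)) / t) ** 2
    inner = midpoint(g, 0.0, 1.0 / h)   # region where K^Ft(ht) is nonzero
    tail = h                            # int_{1/h}^infinity t^-2 dt
    # The integrand is even in t, so the full-line integral is 2*(inner+tail).
    return (inner + tail) / math.pi

def I_approx(h):
    """A_delta * h^(1 - 2 alpha), with A_delta = kappa / (pi C^2)."""
    kappa = midpoint(lambda t: t ** (2 * ALPHA - 2) * k_ft(t) ** 2, 0.0, 1.0)
    return kappa / (math.pi * C ** 2) * h ** (1 - 2 * ALPHA)
```

Under these assumptions the ratio `I_exact(h) / I_approx(h)` moves toward 1 as h decreases, which is the sense in which the approximation holds.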

Bandwidth choice involves replacing I(h) by its known value, for a particular
h (or using the approximation noted just above); replacing $B_W$ by
its estimator, $\hat B_W = \tfrac14 \kappa_2^2 (4\pi^{1/2}\hat\sigma_W^3)^{-1}$; and selecting h by minimizing the
resulting approximation to the sum of the first two terms on the right-hand
side of (4.1). Note that, in the normal case, $R_W = \int (f_W')^2 = (4\pi^{1/2}\sigma_W^3)^{-1}$
and so can be approximated by $\hat R_W = (4\pi^{1/2}\hat\sigma_W^3)^{-1}$. In the results discussed
below we used the exact value of I(h).
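When the power-law approximation to I(h) is used in place of its exact value, the minimisation has a closed form: setting the derivative of $n^{-1} A h^{1-2\alpha} + \hat B_W h^4$ to zero gives $h = \{(2\alpha - 1)A/(4 n \hat B_W)\}^{1/(2\alpha+3)}$, where A is the constant in the approximation. The Python sketch below illustrates that variant only; it is not the exact-I(h) procedure used for the reported results. The function names are hypothetical and the numbers in the usage note are illustrative, not values from the paper.

```python
import math

def b_hat(sigma_w, kappa2):
    """Normal-reference estimate of B_W = (1/4) kappa2^2 * int (f_W')^2."""
    r_hat = 1.0 / (4.0 * math.sqrt(math.pi) * sigma_w ** 3)
    return 0.25 * kappa2 ** 2 * r_hat

def amise(h, n, sigma_w, kappa2, a_delta, alpha):
    """First two terms of (4.1), with I(h) replaced by a_delta * h^(1 - 2 alpha)."""
    return a_delta * h ** (1 - 2 * alpha) / n + b_hat(sigma_w, kappa2) * h ** 4

def h_closed_form(n, sigma_w, kappa2, a_delta, alpha):
    """Root of d/dh amise(h) = 0 for h > 0."""
    ratio = (2 * alpha - 1) * a_delta / (4.0 * n * b_hat(sigma_w, kappa2))
    return ratio ** (1.0 / (2 * alpha + 3))
```

For example, `h_closed_form(100, 1.0, 6.0, 0.0072, 2.0)` returns a bandwidth between 0.2 and 0.3; a grid search over `amise` would serve the same purpose when the exact I(h) is substituted.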

We now report the results of a simulation study designed to investigate
finite sample properties of this empirical bandwidth-selection procedure.
We consider three distributions for W as described above, namely,
(1) $W \sim N(0, 1)$, (2) $W \sim \tfrac12 N(-3, 1) + \tfrac12 N(2, 1)$, (3) $W \sim \mathrm{Gamma}(2, 1)$, and
symmetrized Gamma($\alpha$, 1) distributions [cf. (3.3)] with $\alpha = 1$ and $\alpha = 5$ for
the error distribution. Table 1 gives the theoretically optimal bandwidths
obtained by minimizing the MISE in (4.1). Models (1) and (3) are seen to
require almost identical amounts of smoothing, with model (2) needing a
little more.
Table 1

Theoretically optimal bandwidths for distribution function deconvolution

($\alpha$, n) =   (1,100)   (1,800)   (5,100)   (5,800)
Model (1)     0.18      0.12      0.36      0.31
Model (2)     0.21      0.14      0.38      0.33
Model (3)     0.17      0.11      0.35      0.30


Following Delaigle and Gijbels (2004a), we used a one-step iteration method
to compute $\hat R_W$; the optimal bandwidth estimator ($\hat h$, say) minimized the
resulting estimated MISE function. Table 2 gives the bias and mean squared
error of $\hat h$, based on 500 simulation runs. Numerical results not given here,
for the sake of brevity, show that for all six combinations of the error and
target distributions, the MSE of $\hat h$ decreased with sample size. Moreover,
estimation is most accurate for distribution (1). This is likely due to use
of the "normal reference" in the first step of the iteration. As expected,
the performance of the method is better for the rougher error distribution
($\alpha = 1$), for all target distributions.

Next we consider the performance of the distribution-function estimators,
using integrated squared error (ISE): $\int \{\hat F_W(x \mid \hat h) - F_W(x)\}^2\,dx$. Table 3 gives
values of the bias and the mean squared error of the ISE. Box-plots of
scaled ISE values are given in Figure 6. In each case the scaling factor is the
(theoretical) minimum of the MISE function. The distributions of the scaled
ISE values behave as predicted by the theory for variations in sample size
and in the smoothness of the error distribution. Further, from the box-plots
it appears that, out of the three distributions considered here, the bimodal
case (2) is the most difficult to recover.
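The ISE of a single fitted distribution function can be approximated by quadrature. A minimal Python sketch follows; the names `ise` and `normal_cdf` are hypothetical, and truncating the integral to a finite interval [lo, hi] is an assumption of the sketch rather than part of the definition.

```python
import math

def ise(f_hat, f_true, lo, hi, m=4000):
    """Midpoint-rule approximation to int_lo^hi {f_hat(x) - f_true(x)}^2 dx."""
    step = (hi - lo) / m
    total = 0.0
    for i in range(m):
        x = lo + (i + 0.5) * step
        total += (f_hat(x) - f_true(x)) ** 2
    return total * step

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function of N(mu, sigma^2), e.g. the target in model (1)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
```

A scaled value of the kind plotted in the box-plots would then be `ise(estimate, normal_cdf, -8.0, 8.0) / min_mise`, where `estimate` is a fitted distribution function and `min_mise` is the theoretical minimum of the MISE, both supplied by the user.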

Table 2

The bias and the mean squared error (mse) of the estimated optimal bandwidths $\hat h$ based
on 500 simulation runs. Here x e(d) stands for $x \times 10^{d}$

($\alpha$, n) =   (1,100)            (1,800)            (5,100)            (5,800)
              bias     mse       bias     mse       bias     mse       bias     mse
Model (1)     2.5e-2   1.1e-3    1.8e-2   3.8e-4    6.2e-2   2.8e-2    9.5e-3   6.6e-4
Model (2)     1.2e-1   1.5e-2    5.1e-2   2.7e-3    9.2e-2   9.0e-3    7.0e-2   5.1e-3
Model (3)     7.1e-2   5.4e-3    4.0e-2   1.6e-3    6.8e-2   1.3e-2    4.9e-2   2.5e-3

Table 3

The bias and the mean squared error (mse) of the ISE of deconvolution distribution
function estimators based on 500 simulation runs. Here x e(d) stands for $x \times 10^{d}$

($\alpha$, n) =   (1,100)            (1,800)            (5,100)            (5,800)
              bias     mse       bias     mse       bias     mse       bias     mse
Model (1)     4.4e-3   1.1e-4    5.3e-4   1.8e-6    1.4e-1   1.5e-1    2.0e-2   1.5e-3
Model (2)     1.6e-2   5.1e-4    1.8e-3   7.1e-6    3.4e-2   2.0e-3    7.1e-3   1.1e-4
Model (3)     5.9e-3   1.4e-4    7.5e-4   2.5e-6    5.1e-2   1.0e-2    3.5e-3   1.8e-4

Fig. 6. Box-plots of the scaled ISEs of the deconvolution distribution function estimators
(scaled by the respective minimum MISEs). In each panel the three box-plots correspond
to models (1), (2) and (3), respectively. The number of simulations was 500 in each case.
The percentages of outliers falling outside the prescribed ranges of the box-plots are .05% for
model (2) in the "top, left" panel, 5% for model (1) in the "top, right" panel, none for the
"bottom, left" panel and .05% for model (1) in the "bottom, right" panel.

APPENDIX

A.1. Definition of estimator $\hat F(x \mid h)$. The integral of L, the latter
defined at (2.2), is given by $\int_{v<u} L(v)\,dv = L_1(hu)$, where, provided K and $f_\delta$
are both symmetric functions,

$$L_1(u) = L_1(u \mid h) = \frac12 + \frac{1}{2\pi}\int \frac{\sin(tu)}{t}\,\frac{K^{\mathrm{Ft}}(ht)}{f_\delta^{\mathrm{Ft}}(t)}\,dt. \tag{A.1}$$
Thus, by integrating $\hat f_W$, even if $\hat f_W$ is not well defined, we obtain an
estimator, $\hat F_W$, of the distribution function, $F_W$, of W:

$$\hat F_W(x) = \hat F_W(x \mid h) = \int_{-\infty}^{x} \hat f_W(u)\,du = \frac1n \sum_{j=1}^{n} L_1(x - X_j). \tag{A.2}$$
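The definitions at (A.1) and (A.2) translate directly into a numerical routine. The Python sketch below illustrates them under assumptions that are not taken from the paper: Laplace errors, whose characteristic function is $(1+t^2)^{-1}$, and a kernel whose Fourier transform is $(1-t^2)^3$ on [−1, 1], which need not be the kernel at (3.6). The names `k_ft`, `f_delta_ft`, `l1` and `f_hat` are hypothetical, and a midpoint rule stands in for a library integrator.

```python
import math

def k_ft(t):
    """Assumed kernel transform, supported on [-1, 1]; not necessarily (3.6)."""
    return (1.0 - t * t) ** 3 if abs(t) < 1.0 else 0.0

def f_delta_ft(t):
    """Characteristic function of the assumed Laplace error."""
    return 1.0 / (1.0 + t * t)

def l1(u, h, m=4000):
    """L_1(u | h) = 1/2 + (1/(2 pi)) int sin(tu) K^Ft(ht) / {t f_delta^Ft(t)} dt."""
    upper = 1.0 / h              # K^Ft(ht) vanishes for |t| >= 1/h
    step = upper / m
    total = 0.0
    for i in range(m):
        t = (i + 0.5) * step     # midpoints, so t = 0 is never evaluated
        total += math.sin(t * u) * k_ft(h * t) / (t * f_delta_ft(t))
    # The integrand is even in t, so the full-line integral is twice the
    # half-line integral; the factor 2 cancels against 1/(2 pi).
    return 0.5 + total * step / math.pi

def f_hat(x, data, h):
    """Estimator at (A.2): the average of L_1(x - X_j | h) over the sample."""
    return sum(l1(x - xj, h) for xj in data) / len(data)
```

Because the sine term is odd in u, `l1(u, h) + l1(-u, h)` equals 1, and `l1(0, h)` equals 1/2. The estimate returned by `f_hat` need not be monotone in x, nor confined to [0, 1], in finite samples.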

If $K^{\mathrm{Ft}}$ is compactly supported, and $f_\delta^{\mathrm{Ft}}$ does not vanish on the real line,
then the integral at (A.1) is well defined and finite, provided $h \ne 0$. However,
in view of Theorems 3.1 and 3.4, the case h = 0 is of particular interest. Since
$\int K = 1$ then $K^{\mathrm{Ft}}(0) = 1$, and so it follows from (A.1) that

$$L_1(u \mid 0) = \frac12 + \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{\sin(tu)}{t\,f_\delta^{\mathrm{Ft}}(t)}\,dt, \tag{A.3}$$

assuming that the integral on the right-hand side exists in the Riemann
sense. An integration by parts argument shows that, for the integral in (A.3)
to be Riemann convergent for each u, it is sufficient that

(A.4) $f_\delta^{\mathrm{Ft}}(t)$ is differentiable and $(d/dt)\{t f_\delta^{\mathrm{Ft}}(t)\}^{-1}$ is integrable.

Reflecting (A.3), we take $L_1(u \mid 0) = \tfrac12$ when u = 0.

The models for $f_\delta$ that are commonly used in practice are of Laplace type,
and there

(A.5) $|1/f_\delta^{\mathrm{Ft}}(t)|$ and $|t\,(d/dt)\{1/f_\delta^{\mathrm{Ft}}(t)\}|$ are both bounded, both above and below,
by constant multiples of $|t|^{\alpha}$, as $|t|$ increases,

where $\alpha > 0$ is a parameter of the model. In this setting, (A.4) holds if and
only if $\alpha < 1$, and then (A.3) also prevails. When (A.5) is true, the constraint
$\alpha < 1$ is less constrictive than (3.1), which characterizes root-n consistency
of $\hat F_W(\cdot \mid 0)$ for $F_W$; see Theorems 3.1 and 3.4. Indeed, if (A.5) holds, then
(3.1) is true if and only if $\alpha < \tfrac12$.

Therefore, for the sort of distribution of $\delta$ for which one might practically
be interested in taking h = 0 in $\hat F(x \mid h)$, one can expect the estimator

$$\hat F_W(x \mid 0) = \frac1n \sum_{j=1}^{n} L_1(x - X_j \mid 0)$$

to be well defined and finite for each x. More generally, however, provided
(3.1) obtains, the quantities

$$\int_{-\infty}^{\infty} \{\hat F_W(x \mid 0) - F_W(x)\}^2\,dx \tag{A.6}$$

are well defined and finite, either in their own right or as limits of their
counterparts when h > 0, without considering models for which (A.4) holds.
Existence in their own right follows from the fact that, assuming (3.1), 
(A.3) implicitly defines almost-everywhere a function $L_1(u \mid 0)$ − which,
by Parseval's theorem, is square-integrable. There are several ways of
formally defining this function, for example, as an almost-everywhere limit
of a subsequence of a sequence of Fourier inverses of compactly-supported
approximations to the Fourier transform of $L_1(u \mid 0)$ −, or as an
almost-everywhere limit along a subsequence, as $h \to 0$, of $L_1(u \mid h)$ −

Hence, it is appropriate to discuss the value of, and rate of convergence
to zero of, both of the quantities at (A.6), without imposing conditions such
as (A.5). Reflecting this point, in the formulation of Theorem 3.1 we do not
require such assumptions.

A.2. Classes of potential distributions of $\delta$. Note particularly that all
the function classes $\mathcal F_j$ include constraints which prevent $f_\delta^{\mathrm{Ft}}$ from ever
vanishing if the corresponding distribution lies in that class. Given $\alpha, C > 0$,
write $\mathcal F_3(\alpha, C)$ for the class of continuous distributions $F_\delta$ for which $f_\delta^{\mathrm{Ft}}$ is
real-valued and positive, $\sup f_\delta < C$, $E|\delta| < C$ and $C f_\delta^{\mathrm{Ft}}(t) > (1 + |t|)^{-\alpha}$. (See
the end of Appendix A.3 for interpretation of conditions on boundedness of
densities.)

Given an integer k > 0, and q € (2k, 2k + 2), let T^{C, q) denote the class of 
Fg for which E(5^ k+1 ^) < C and ff l is real- valued and positive and satisfies 



(A.7) 



+ J™t- 2 ^f!\t)- 2 dt<C. 



[The first part of (A.7) is essentially a moment condition.] Write $\mathcal F_5(\alpha, C)$
for the class of all $F_\delta \in \mathcal F_3(\alpha, C)$ for which $E(\delta^{4(k+1)}) < C$. Let $\mathcal F_6(\alpha, C)$
be the set of all $F_\delta \in \mathcal F_3(\alpha, C)$ for which $|(f_\delta^{\mathrm{Ft}})^{(j)}(t)| < C(1 + |t|)^{-\alpha - j}$ for
j = 0, 1, 2. [Therefore, if $F_\delta \in \mathcal F_6(\alpha, C)$, then $|f_\delta^{\mathrm{Ft}}|$ is bounded above and
below by constant multiples of $(1 + |t|)^{-\alpha}$.]

A.3. Classes of potential distributions of W. For $\beta > 0$ and C > 0, let
$\mathcal G_3(\beta, C)$ be the class of $F_W$ for which $|f_W^{\mathrm{Ft}}(t)| < C(1 + |t|)^{-\beta}$ and $E|W| < C$,
and let $\mathcal G_4(\beta, C)$ be the class of $F_W \in \mathcal G_3(\beta, C)$ such that

$$\sup_{u>0}\,(1+u)^{\beta-k} \sup_{|x|>x_0} \left| \int_0^u e^{itx} t^k f_W^{\mathrm{Ft}}(t)\,dt \right| < \frac{Ck}{(k_\beta - \beta)\,|x_0|}$$

for each $x_0 > 0$ and each integer $k > \beta$, where $k_\beta$ is the least such integer.
(See two paragraphs below for interpretation of this constraint.) Given
an integer k > 0, let $\mathcal G_5(C, k)$ be the class of $F_W$ for which $\sup f_W < C$
and $E(W^{4(k+1)}) < C$; and write $\mathcal G_6(\beta, C)$ for the class of $F_W$ such that
$|(f_W^{\mathrm{Ft}})^{(j)}(t)| < C(1 + |t|)^{-\beta - j}$ for $0 \le j \le 2k + 2$, and $E(W^{4(k+1)}) < C$.
Let 0 < u < 1, let $g : [0, 1] \to [0, \infty)$ be such that $g(x) \to 0$ as $x \downarrow 0$, and
denote by $\mathcal G^*(C, g)$ the class of $F_W$ such that (a) $f_W$ exists and is strictly
positive in $\mathcal I_u(C) = [\xi_u - C^{-1}, \xi_u + C^{-1}]$, (b) $f_W(\xi_u) + f_W(\xi_u)^{-1} < C$, and
(c) $|f_W(x) - f_W(y)| < g(\eta)$ for any $\eta \in [0, 1]$ and all $x, y \in \mathcal I_u(C)$ with
$|x - y| < \eta$. Write $\mathcal G_7(\beta, C, u, g)$ for the class of all $F_W \in \mathcal G_3(\beta, C) \cap \mathcal G^*(C, g)$, and
$\mathcal G_8(\beta, C, u, g)$ for the class of all $F_W \in \mathcal G_4(\beta, C) \cap \mathcal G^*(C, g)$ for which $> C^{-1}$.

Next we elucidate some of these function classes. If C > 0 is sufficiently
large then $\mathcal G_3(\beta, C)$ contains $\phi_\beta$, defined at (3.3). To appreciate the sorts of
distributions that are in $\mathcal G_4(\beta, C)$, note that in many instances where $F_W$
is centered at the origin and $F_W \in \mathcal G_3(\beta, C_1)$, it holds true that, for some
$C_2 > 0$, $|(f_W^{\mathrm{Ft}})'(t)| < C_2 (1 + |t|)^{-\beta - 1}$. Consider, for example, the case where
$f_W^{\mathrm{Ft}}(t) = (1 + B|t|)^{-\beta}$ with B > 0. In such cases, an integration-by-parts
argument shows that, for u > 0,

$$\left| x \int_0^u e^{itx} t^k f_W^{\mathrm{Ft}}(t)\,dt \right| \le u^k |f_W^{\mathrm{Ft}}(u)| + \left| \int_0^u e^{itx} t^{k-1} \{k f_W^{\mathrm{Ft}}(t) + t (f_W^{\mathrm{Ft}})'(t)\}\,dt \right| \le \{C_1 + (k_\beta - \beta)^{-1}(C_1 k + C_2)\}(1 + u)^{k - \beta}.$$

Therefore, $F_W \in \mathcal G_4(\beta, C)$ if $C > 2C_1 + C_2$.

In the definitions of function classes $\mathcal F_j$ and $\mathcal G_j$, the conditions (a) $\sup f_\delta < C$
or (b) $\sup f_W < C$ are imposed only to ensure that (c) $\sup f_X < C$. Property (c)
holds if either (a) or (b) does, and so it is possible to switch the condition
$\sup f_\delta < C$, in the definition of a function class $\mathcal F_j$, to $\sup f_W < C$, in the
definition of $\mathcal G_k$, whenever both $F_\delta \in \mathcal F_j$ and $F_W \in \mathcal G_k$ are assumed. This
feature allows variants of several of our theorems to be formulated easily;
those variants will not be discussed explicitly.

A.4. Regularity conditions for Theorems 3.8 and 4.1. For Theorem 3.8 
we take the kernel to be given by (3.6), with integers r and s as in that 
formula, and assume that 

the distribution of 5 is symmetric about the origin, and for all t > 0, 

fP(t) = z -\\ + tr a + *i(i + t)- ai +... +Zp (i+ t)- a *> + A(t), 

(A.8) 

where 2 < ol < a\ < ■ ■ ■ < a p+ \ , a p+ \ > 2a, z,z%, . . . ,z p are nonzero 

real numbers, and |A^(t)| < const. (1 + ty ap+1 ~ j for j = 0, 1; 

as $t \to \infty$, $f_W^{\mathrm{Ft}}(t) = (a + ib)\,t^{-\beta} + o(t^{-\beta})$, where a, b are real numbers,
$\beta > 0$ and, for each $x_0 > 0$ and for $0 \le k \le rs - 1$,



(A-9) 



{l + uf- k sup 

x|>2,'0 

as u — > 00. 



f e itx t k {f^(t) - (a + »6)(1 + 1)"^} d£ 
Jo 
In Theorem 4.1 we assume that

(A.10) K is symmetric and satisfies $\int K = 1$ and $\int x^2 K(x)\,dx \ne 0$, and
$K^{\mathrm{Ft}}$ is compactly supported;

(A.11) $E|\delta| < \infty$; for all t, $f_\delta^{\mathrm{Ft}}(t) \ne 0$; for some $\alpha > \tfrac12$ and C > 0,
$f_\delta^{\mathrm{Ft}}(t) \sim C t^{-\alpha}$ as $t \to \infty$; and for some $\beta > \tfrac32$ and $C_1 > 0$,
$F_W \in \mathcal G_3(\beta, C_1)$.

The condition $\beta > \tfrac32$ in (A.11) implies that $\int (f_W')^2 < \infty$.
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